Getting the lead out.
Authors
Abstract
Gene expression signatures consisting of tens to hundreds of genes have been found to be informative for different biological states. Recently, many computational methods have been proposed for biological interpretation of such signatures. However, there is a lack of methods for identifying cell signaling pathways whose deregulation results in an observed expression signature. We present a strategy for identifying such signaling pathways and evaluate the strategy using six human and mouse gene expression signatures.

Background

Genetic aberrations and variations in cellular processes are usually reflected in the expression levels of many genes. Hence, such alterations can potentially be characterized by their gene expression profiles. Gene expression profiling, in particular DNA microarray analysis, has been widely used in attempts to reveal the underlying mechanisms of many diseases, different developmental stages, cellular responses to different conditions, and many other biological phenomena (for example, [1-3]). Gene expression signatures consisting of tens to hundreds of genes have been associated with many important aspects of the systems studied.

To help realize the full potential of gene expression studies, a variety of methods, such as GenMAPP [4], GoMiner [5], DAVID [6] and its desktop version EASE [7], Catmap [8], ArrayXPath [9], and Gene Set Enrichment Analysis (GSEA) [10], have been developed to relate gene expression profiles or signatures to a broad range of biological categories. Although some of these methods include signaling pathways in their categories, their focus has not been on regulatory mechanisms that control the observed gene expression changes.

Signal transduction is at the core of many regulatory systems. Cellular functions such as growth, proliferation, differentiation, and apoptosis are regulated by signaling pathways. Appropriate regulation of such pathways is essential for the normal functioning of cells.
Cells affected by disease often have one or several signaling pathways abnormally activated or inactivated. For example, cancer is a disease of deregulated cell proliferation and death [11]. To uncover mechanisms underlying cellular phenotypes, therefore, it is crucial to systematically analyze gene expression signatures in the context of signaling pathways.

In signal transduction, ligands, usually from outside the cell, interact with receptors on the surface of the cell membrane or with nuclear receptors. These interactions trigger a cascade of biochemical reactions. Proteins called transcription factors (TFs) and cofactors are eventually transported to, or activated in, the nucleus of the cell where they turn transcription of target genes on or off. A signaling pathway is composed of a set of molecular components conveying the signal, such as ligands, receptors, enzymes, TFs, and cofactors.

Liu and Ringnér, Genome Biology 2007, Volume 8, Issue 5, Article R77. Published: 11 May 2007. Genome Biology 2007, 8:R77 (doi:10.1186/gb-2007-8-5-r77). Received: 6 October 2006. Revised: 19 April 2007. Accepted: 11 May 2007. The electronic version of this article is the complete one and can be found online at http://genomebiology.com/2007/8/5/R77

When a pathway is activated, the expression levels of the components of the pathway are not necessarily affected. For example, mutation of a TF can change the expression levels of its target genes, without necessarily affecting the expression levels of the TF itself or other components of the pathway. Also, pathway components might not be regulated at the transcriptional level; instead, they are often regulated post-translationally, for example, by phosphorylation. Proteomic data could be used to detect such modifications and be used for pathway analysis, but currently there is a lack of such genome-wide protein data.
It has been pointed out that gene expression signatures may be more reliable indicators of pathway activities than protein data for single components in signaling pathways [12]. Taking all these considerations into account, we reason that the activity of a signaling pathway may currently be best characterized by the expression levels of its target genes. In support of this hypothesis, Breslin et al. [13] have shown the capacity of expression levels of known target genes to reflect pathway activities.

However, knowledge about target genes of TFs is far from complete, which hampers accurate prediction of pathway activities. On the other hand, the cis-regulatory motifs to which TFs bind are often better characterized. For organisms with sequenced genomes, these motifs enable genome-wide identification of putative target genes by looking for potential TF binding sites in promoter sequences. Therefore, integrating regulatory motif analysis with pathway information would be a potential approach to break this bottleneck for pathway analysis. Recently, the feasibility of using putative binding sites to identify TFs responsible for gene expression signatures of human cancer has been demonstrated [14].

Here we present a strategy to discover activated and inactivated signaling pathways from gene expression signatures by using regulatory motif analysis (Figure 1). To achieve this goal, we began by extracting all signaling pathways in the TRANSPATH database [15], and characterized each pathway by the TFs that mediate it. In all human and mouse promoter sequences, we identified putative binding sites of all the TFs mediating pathways using TF binding site position weight matrices from the TRANSFAC database [16]. Next, we investigated promoters of genes in gene expression signatures for an enrichment of these putative binding sites.
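As a rough illustration of the enrichment step described above, the sketch below asks whether promoters of signature genes carry a putative binding site for a TF more often than expected by chance. It is a minimal sketch and not the authors' implementation: the one-sided hypergeometric test, the function names `hypergeom_upper_tail` and `motif_enrichment`, and the toy gene identifiers are all assumptions made for illustration, since the text here does not state which statistic was used.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """Return P(X >= k) for a hypergeometric draw.

    N: genes in the background, K: background genes with the motif,
    n: genes in the signature, k: signature genes with the motif.
    """
    total = comb(N, n)
    upper = min(K, n)
    # Sum the probabilities of seeing k, k+1, ..., upper motif-bearing genes.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def motif_enrichment(signature, background, motif_targets):
    """P-value for over-representation of a motif in a signature.

    All three arguments are iterables of gene identifiers (hypothetical
    inputs); genes outside the background are ignored.
    """
    bg = set(background)
    sig = set(signature) & bg
    targets = set(motif_targets) & bg
    k = len(sig & targets)
    return hypergeom_upper_tail(k, len(targets), len(sig), len(bg))


if __name__ == "__main__":
    # Toy data (assumed): 20 background genes, 5 of which have a putative site.
    background = [f"g{i}" for i in range(20)]
    motif_targets = background[:5]
    signature = ["g0", "g1", "g2", "g10"]
    print(motif_enrichment(signature, background, motif_targets))
```

A small p-value for a TF would suggest that the signature is enriched for its putative targets. How such per-TF results are combined into a pathway-level score is not specified in this excerpt, so the sketch stops at the single-TF test.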
Finally, we measured the activity of a pathway in a gene expression signature in terms of the enrichment of binding motifs for the TFs mediating the pathway. Although the use of putative TF binding sites will introduce false-positive target genes for each TF, when the promoters of a set of co-expressed genes are enriched for a putative TF binding site, the gene set is also likely enriched for true target genes. Moreover, our strategy to integrate regulatory motif analysis with knowledge about which TFs act together in pathways further reduces the influence of false-positive targets on the identification of pathways.

Our results for six human and mouse gene expression signatures demonstrate the power of our method to identify relevant pathways. We compared our results with those obtained using two widely used methods for relating gene expression profiles to biological categories, EASE [7] and GSEA [10]. For data sets with known pathways activated, we found that our strategy identified the expected pathways whereas EASE and GSEA did not. Hence, our strategy provides additional information complementary to what can be obtained using current methods for biological interpretation of gene expression data.

Results and discussion

Gene signatures for oncogenic pathways

To examine the ability of our method to accurately detect the activity of pathways, we obtained gene signatures for three oncogenic pathways produced by Bild et al. [17]. These signatures consist of genes for which the expression levels in human mammary epithelial cells were highly correlated with the activation status of the oncogenes encoding E2F3 (268 genes), Myc (218 genes), or Ras (304 genes), respectively. These three oncogenic pathways are often activated in solid tumors, including breast tumors, where they contribute to tumor development or progression. Bild et al. verified the activation status of each pathway using various biochemical measurements and demonstrated that the expression patterns in each signature were specific to each pathway. Hence, these signatures are ideal for evaluating our strategy to identify activated pathways.

The statistically significant pathways identified by our method for the three gene signatures are shown in Table 1. The E2F pathway was extremely significant for the E2F3 gene signature. E2F3 is a member of the E2F TF family (E2Fs). E2Fs can induce cell cycle G1 to S transition and activate many genes encoding proteins essential for DNA replication [18,19]. E2F1, another member of the E2Fs, can form dimers with DP-1, making this activation more efficient [20]. Our method identified both E2F1 (P < 0.001) and DP-1 (P < 0.001) as significant TFs for this signature.

TRANSPATH does not contain a strictly defined Myc pathway, but it includes three pathways containing c-Myc as a TF: the epidermal growth factor (EGF), Notch, and mitogen-activated protein kinase (MAPK) pathways. We identified c-Myc as a significant TF for the Myc signature (P < 0.001), and both the EGF and the Notch pathways were found to be significant. The MAPK pathway was not found to be significant. The only significant TF found for the MAPK pathway was c-Myc, perhaps suggesting that induction of c-Myc is not sufficient to deregulate this pathway. Consistent with this suggestion, it has been shown that elevated c-Myc expression is not sufficient for tumorigenesis in human mammary epithelial cells [21]. Interestingly, we also found the hypoxia-inducible pathway HIF-1 significant. Studies have shown that HIF-1 is activated in many tumors, including breast cancer [22], as a
Similar resources
The Impact of Packaged Food on Children's Health: Strategies to Overcome
Background: Packaged foods are often high in sugar, salt, fat, additives, preservatives, artificial colours and flavours and low in nutrients. These factors may lead to weight gain, delayed growth and development, type 2 diabetes, damage to the teeth, food poisoning, and many other health problems. This short commentary has critically examined the effects of packaged food on children's health a...
Investigation of Synchronization in the Kuramoto Model on a Regular Network
Stable solutions of the Kuramoto model on a regular network are investigated. It is shown that there are two stable states: a completely synchronized state with an order parameter equal to one and a completely incoherent state with an order parameter equal to zero. Also, the situation that could lead to the order parameter just equal to one is found out. Furthermore, it is shown that the phase ...
Leaching of Lead and Zinc from a Low-Grade Oxide Ore in Citric Acid Media
Leaching of a low-grade lead and zinc oxide ore was studied by an organic reagent, citric acid. The quadratic mathematical models were developed for the relationship among the influential parameters and lead and zinc recoveries. Leaching rate was strongly affected by the interactive effects of factors. It was also found that the quadratic effects of stirring speed and linear<e...
The end of mandatory retirement for tenured faculty.
842 Get-the-Lead-Out Guru Challenged 845 Greenhouse Bandwagon Rolls On 846 Jawboning Prehistory Has Challenger Knocked Out Galileo? 847 Policy-Making: Getting Better Data 848 Small Is Beautiful for University Space Outfit a In Search of a Free Launch 850 Post-Mortem on Storb Resignation A Changing of the Guards A Reprieve for MIT's Magnet Lab 851 Briefings: Gene Zapping m The Case of the Missin...
Bit-(Parallelism)2: Getting to the Next Level of Parallelism
We investigate the problem of getting to a higher instruction-level parallelism in string matching algorithms. In particular, starting from an algorithm based on bit-parallelism, we propose two flexible approaches for boosting it with a higher level of parallelism. These approaches are general enough to be applied to other bit-parallel algorithms. It turns out that higher levels of parallelism l...
A Mathematical Model to Optimize Allocation Sequence in Dispatching Problem
The truck-shovel fleet, the most common transportation system in open-pit mines, accounts for a significant part of mining costs, so optimal management of it can lead to substantial cost reductions. Among the available dispatch mathematical models, the multi-stage approach is well suited for allocating trucks to their respective shovels in a dynamic dispatching program. However, with this kind of modeling sequ...
Journal title:
- Environmental Health Perspectives
Volume 104 Issue
Pages -
Publication date: 1996